
    Accelerating Toeplitz Neural Network with Constant-time Inference Complexity

    Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks. They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexity. State Space Models (SSMs), on the other hand, achieve lower performance than TNNs in language modeling but offer the advantage of constant inference complexity. In this paper, we aim to combine the strengths of TNNs and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to achieve the same constant inference complexity as SSMs. To accomplish this, we formulate the conversion process as an optimization problem and provide a closed-form solution. We demonstrate how to transform the target equation into a Vandermonde linear system, which can be solved efficiently using the Discrete Fourier Transform (DFT). Notably, our method requires no training, maintains numerical stability, and can be applied to any LongConv-based model. To assess its effectiveness, we conduct extensive experiments on language modeling tasks across various settings, and we compare our method with gradient-descent-based solutions, highlighting the superior numerical stability of our approach. The source code is available at https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.
    Comment: Accepted to EMNLP 2023. Yiran Zhong is the corresponding author.
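    To see why the DFT can solve the resulting Vandermonde system, here is a minimal NumPy sketch (an illustration of the general principle, not the paper's implementation): when the Vandermonde nodes are the n-th roots of unity, the Vandermonde matrix coincides with the DFT matrix, so the system is solved exactly by one inverse FFT.

```python
import numpy as np

# Minimal sketch, assuming the Vandermonde nodes are the n-th roots of
# unity: then V[j, k] = exp(-2*pi*i*j*k/n) is exactly the DFT matrix,
# and V @ x = b is solved by an inverse FFT in O(n log n) time.
n = 8
omega = np.exp(-2j * np.pi / n)           # primitive n-th root of unity
nodes = omega ** np.arange(n)             # Vandermonde nodes on the unit circle
V = np.vander(nodes, n, increasing=True)  # V[j, k] = nodes[j] ** k

b = np.random.randn(n) + 1j * np.random.randn(n)
x = np.fft.ifft(b)                        # closed-form solve via inverse DFT
assert np.allclose(V @ x, b)              # agrees with a direct linear solve
```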

    Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes

    Unsupervised deep learning for optical flow computation has achieved promising results. Most existing deep-network-based methods rely on image brightness consistency and a local smoothness constraint to train the networks, so their performance degrades in regions with repetitive textures or occlusions. In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method that incorporates global geometric constraints into network learning. In particular, we investigate multiple ways of enforcing the epipolar constraint in flow estimation. To alleviate a "chicken-and-egg" type of problem encountered in dynamic scenes, where multiple motions may be present, we propose a low-rank constraint as well as a union-of-subspaces constraint for training. Experimental results on various benchmark datasets show that our method achieves competitive performance compared with supervised methods and outperforms state-of-the-art unsupervised deep-learning methods.
    Comment: CVPR 2019
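    To make the epipolar constraint concrete, here is a hedged NumPy sketch of one plausible way to penalize it (the Sampson distance); the function name and shapes are illustrative, not the paper's actual training loss.

```python
import numpy as np

# Hypothetical sketch: given a fundamental matrix F, a pixel p in frame 1
# and its flow-displaced match p' = p + flow in frame 2 should satisfy
# p'^T F p = 0; the Sampson distance penalizes violations of this constraint.
def sampson_epipolar_loss(pts1, flow, F, eps=1e-8):
    """pts1: (N, 2) pixel coords; flow: (N, 2) estimated flow; F: (3, 3)."""
    pts2 = pts1 + flow
    ones = np.ones((len(pts1), 1))
    p1 = np.hstack([pts1, ones])          # homogeneous coordinates, frame 1
    p2 = np.hstack([pts2, ones])          # homogeneous coordinates, frame 2
    Fp1 = p1 @ F.T                        # epipolar lines in image 2
    Ftp2 = p2 @ F                         # epipolar lines in image 1
    algebraic = np.sum(p2 * Fp1, axis=1)  # p2_i^T F p1_i per correspondence
    denom = Fp1[:, 0]**2 + Fp1[:, 1]**2 + Ftp2[:, 0]**2 + Ftp2[:, 1]**2
    return np.mean(algebraic**2 / (denom + eps))
```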

    A Projection-Based Approach for Distributed Energy Resources Aggregation

    Aggregating distributed energy resources (DERs) is of great significance for improving the overall operational efficiency of the smart grid. The aggregation model needs to consider various factors, such as network constraints, operational constraints, and the economic characteristics of the DERs. This paper constructs a multi-slot DER aggregation model that accounts for these factors using a feasible-region projection approach, which protects the DERs' private data and eliminates internal variables. A system economic dispatch (ED) model is established so that operators can make full use of the DER clusters. We compute temporally coupled feasible regions by extending the Progressive Vertex Enumeration (PVE) algorithm to higher dimensions via the Quickhull algorithm. Finally, an IEEE 39-bus distribution network with DERs is simulated to verify the effectiveness of the proposed model. Results show that the two-step ED yields the same results as the centralized ED.
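    As a rough illustration of the feasible-region projection idea (a sketch under assumed toy data, not the paper's PVE algorithm), one can approximate the projection of a DER's feasible polytope onto aggregate coordinates by optimizing over sampled directions and then recovering the projected vertices with Quickhull, which SciPy exposes via Qhull:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial import ConvexHull

# Hypothetical sketch: project the feasible set {x : A x <= b} of a single
# DER onto two aggregate coordinates (say, power in two time slots) by
# maximizing along sampled directions, then enumerate the vertices of the
# projected 2-D region with Quickhull (scipy.spatial.ConvexHull wraps Qhull).
A = np.array([[1.0, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0],
              [0, 0, 1], [0, 0, -1], [1, 1, 1]])
b = np.array([1.0, 0, 1, 0, 1, 0, 2])    # unit box with a total-energy cap
P = np.array([[1.0, 0, 0], [0, 1, 0]])   # projection onto the first two slots

points = []
for theta in np.linspace(0, 2 * np.pi, 32, endpoint=False):
    d = np.array([np.cos(theta), np.sin(theta)])  # search direction in 2-D
    res = linprog(-(P.T @ d), A_ub=A, b_ub=b, bounds=[(None, None)] * 3)
    points.append(P @ res.x)                      # boundary point of the projection

hull = ConvexHull(np.array(points))               # Quickhull vertex enumeration
print(np.array(points)[hull.vertices])            # vertices of the projected region
```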

    Image-based Geolocalization by Ground-to-2.5D Map Matching

    We study the image-based geolocalization problem, aiming to localize ground-view query images on cartographic maps. Current methods often use cross-view localization techniques to match ground-view query images with 2D maps, but their performance is unsatisfactory because of significant cross-view appearance differences. In this paper, we lift cross-view matching to a 2.5D space, where the heights of structures (e.g., trees and buildings) provide geometric information to guide the cross-view matching. We propose a new approach to learning representative embeddings from multi-modal data. Specifically, we establish a projection relationship between the 2.5D space and the 2D aerial-view space, and use this projection to combine multi-modal features from the 2.5D and 2D maps with an effective pixel-to-point fusion method. By encoding crucial geometric cues, our method learns discriminative location embeddings for matching panoramic images and maps. Additionally, we construct the first large-scale ground-to-2.5D map geolocalization dataset to validate our method and to facilitate future research. Extensive experiments on both single-image-based and route-based localization demonstrate that the proposed method achieves significantly higher localization accuracy and faster convergence than previous 2D map-based approaches.
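    One way to picture the pixel-to-point fusion step (a hedged sketch with illustrative names and shapes, not the paper's actual architecture) is to scatter the features of 2.5D map points into the aerial-grid cell each point projects to and concatenate the result with the 2D map's per-pixel features:

```python
import numpy as np

# Hypothetical sketch: average-pool features of 2.5D points (x, y, height)
# into the 2-D aerial grid cell they project to, then concatenate with the
# 2-D map features so each pixel carries both appearance and geometry cues.
def fuse_25d_into_2d(points_xy, point_feats, map_feats, cell_size=1.0):
    """points_xy: (N, 2) ground coords; point_feats: (N, C); map_feats: (H, W, C)."""
    H, W, C = map_feats.shape
    cols = np.clip((points_xy[:, 0] / cell_size).astype(int), 0, W - 1)
    rows = np.clip((points_xy[:, 1] / cell_size).astype(int), 0, H - 1)
    pooled = np.zeros((H, W, C))
    counts = np.zeros((H, W, 1))
    np.add.at(pooled, (rows, cols), point_feats)  # sum point features per cell
    np.add.at(counts, (rows, cols), 1.0)          # count points per cell
    pooled /= np.maximum(counts, 1.0)             # average pooling
    return np.concatenate([map_feats, pooled], axis=-1)  # (H, W, 2C) fused map
```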